Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation
Abstract
In this paper the performance of bagging in classification problems is analysed theoretically, using a framework developed in works by Tumer and Ghosh and extended by the authors. A bias-variance decomposition is derived, which relates the expected misclassification probability attained by linearly combining classifiers trained on N bootstrap replicates of a fixed training set to that attained by a classifier trained on a single bootstrap replicate of the same training set. Theoretical results show that the expected misclassification probability of bagging has the same bias component as a single bootstrap replicate, while the variance component is reduced by a factor of N. Experimental results show that the performance of bagging, as a function of the number of bootstrap replicates, closely follows the theoretical prediction. Finally, it is shown that the theoretical results derived for bagging also apply to other methods for constructing multiple classifiers based on randomisation, such as the random subspace method and tree randomisation.
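The claim that averaging leaves the bias unchanged while dividing the variance by N can be illustrated with a toy simulation. The sketch below is not the paper's derivation: it assumes each ensemble member produces an estimate equal to a fixed bias plus independent Gaussian noise, and the function name and all parameter values are hypothetical, chosen only for illustration. Real bagged classifiers are correlated through the shared training set, so this is the idealised case only.

```python
import random
import statistics


def simulate_averaged_estimate(n_estimators, bias=0.5, sigma=1.0,
                               trials=20000, seed=0):
    """Return (mean, variance) of the average of n_estimators noisy estimates.

    Toy model (an assumption, not taken from the paper): each estimate is
    bias + Gaussian noise with standard deviation sigma, and the noise terms
    are independent across estimators.
    """
    rng = random.Random(seed)
    averages = [
        sum(bias + rng.gauss(0.0, sigma) for _ in range(n_estimators))
        / n_estimators
        for _ in range(trials)
    ]
    # The mean of the averages estimates the bias component;
    # their variance estimates the variance component.
    return statistics.fmean(averages), statistics.pvariance(averages)


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        mean, var = simulate_averaged_estimate(n)
        print(f"N={n:3d}  mean={mean:.3f}  variance={var:.4f}  "
              f"sigma^2/N={1.0 / n:.4f}")
```

Under these assumptions the printed mean should stay close to the chosen bias for every N, while the empirical variance should track sigma^2/N.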
Similar resources
Improving the Quality of Text Classification Using a Two-Level Classifier Committee
Nowadays, automated text classification has gained particular importance due to the increasing availability of documents in digital form and the ensuing need to organize them. Although this problem belongs to the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have performed better than the others. I...
Combining Bias and Variance Reduction Techniques for Regression Trees
Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches — ...
Application of ensemble learning techniques to model the atmospheric concentration of SO2
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Using Data Mining Models for Differential Diagnosis of Iron Deficiency Anemia and β-thalassemia Minor
Introduction: Iron deficiency anemia is one of the most common types of anemia, and its main differential diagnosis is β-thalassemia minor. Rapid and accurate screening of β-thalassemia minor is particularly important for pre-marriage medical counseling, for preventing the birth of neonates with β-thalassemia major, and for differentiating it from iron deficiency anemia to avoid unnecessar...